
Seeing Hate Differently: Hate Subspace Modeling for Culture-Aware Hate Speech Detection

Cai, Weibin, Zafarani, Reza

arXiv.org Artificial Intelligence

Hate speech detection has been extensively studied, yet existing methods often overlook a real-world complexity: training labels are biased, and interpretations of what is considered hate vary across individuals with different cultural backgrounds. We first analyze these challenges, including data sparsity, cultural entanglement, and ambiguous labeling. To address them, we propose a culture-aware framework that constructs individuals' hate subspaces. To alleviate data sparsity, we model combinations of cultural attributes. For cultural entanglement and ambiguous labels, we use label propagation to capture distinctive features of each combination. Finally, we construct individual hate subspaces, which in turn can further enhance classification performance. Experiments show our method outperforms the state of the art by 1.05% on average across all metrics.
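The label-propagation step the abstract mentions can be sketched as a simple iterative update over cultural-attribute combinations; the similarity matrix, seed labels, and update rule below are illustrative assumptions, not the paper's actual formulation.

```python
# Toy label propagation over cultural-attribute combinations
# (all names and numbers here are hypothetical).
import numpy as np

def propagate_labels(sim, labels, alpha=0.5, iters=10):
    """Iteratively blend each combination's label scores with its neighbors'.

    sim    : (n, n) row-normalized similarity between attribute combinations
    labels : (n, k) initial (possibly sparse or ambiguous) label scores
    alpha  : weight kept on the original seed labels each step
    """
    y = labels.astype(float).copy()
    for _ in range(iters):
        y = alpha * labels + (1 - alpha) * sim @ y
    return y

# 3 attribute combinations, 2 classes (hate / not-hate).
sim = np.array([[0.0, 1.0, 0.0],
                [0.5, 0.0, 0.5],
                [0.0, 1.0, 0.0]])
seed = np.array([[1.0, 0.0],   # combination 0: labeled hate
                 [0.0, 0.0],   # combination 1: unlabeled
                 [0.0, 1.0]])  # combination 2: labeled not-hate
scores = propagate_labels(sim, seed)
```

After propagation, the unlabeled combination receives soft scores from both labeled neighbors, which is the kind of signal a per-individual hate subspace could then be built from.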


WATCHED: A Web AI Agent Tool for Combating Hate Speech by Expanding Data

Piot, Paloma, Sánchez, Diego, Parapar, Javier

arXiv.org Artificial Intelligence

Online harms are a growing problem in digital spaces, putting user safety at risk and reducing trust in social media platforms. One of the most persistent forms of harm is hate speech. To address this, we need tools that combine the speed and scale of automated systems with the judgment and insight of human moderators. These tools should not only find harmful content but also explain their decisions clearly, helping to build trust and understanding. In this paper, we present WATCHED, a chatbot designed to support content moderators in tackling hate speech. The chatbot is built as an Artificial Intelligence Agent system that uses Large Language Models along with several specialised tools. It compares new posts with real examples of hate speech and neutral content, uses a BERT-based classifier to help flag harmful messages, looks up slang and informal language using sources like Urban Dictionary, generates chain-of-thought reasoning, and checks platform guidelines to explain and support its decisions. This combination allows the chatbot not only to detect hate speech but to explain why content is considered harmful, grounded in both precedent and policy. Experimental results show that our proposed method surpasses existing state-of-the-art methods, reaching a macro F1 score of 0.91. Designed for moderators, safety teams, and researchers, the tool helps reduce online harms by supporting collaboration between AI and human oversight.
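The decision logic that combines WATCHED's tools into an explained verdict might look roughly like this minimal sketch; all function names, thresholds, and data are hypothetical, and the real system delegates these steps to an LLM agent rather than hard-coded rules.

```python
# Hypothetical sketch: combine a classifier score, retrieved precedents,
# and platform guidelines into a single explained moderation decision.
from dataclasses import dataclass

@dataclass
class Decision:
    is_hate: bool
    explanation: str

def moderate(post, classifier_score, precedents, guidelines, threshold=0.5):
    """Flag a post and collect human-readable reasons for the decision."""
    similar = [p for p in precedents
               if any(w in post.lower() for w in p["keywords"])]
    flagged = classifier_score >= threshold
    reasons = []
    if flagged:
        reasons.append(f"classifier score {classifier_score:.2f} >= {threshold}")
    for p in similar:
        reasons.append(f"similar to precedent: {p['label']}")
    if flagged and guidelines:
        reasons.append(f"violates guideline: {guidelines[0]}")
    return Decision(is_hate=flagged, explanation="; ".join(reasons) or "no signals")

d = moderate(
    "example hateful post about group X",
    classifier_score=0.91,
    precedents=[{"keywords": ["group x"], "label": "hate"}],
    guidelines=["no attacks on protected groups"],
)
```

The point of the sketch is the shape of the output: a verdict grounded in both precedent and policy, which is what lets moderators audit the system's reasoning.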


Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories

Lisker, Mareike, Gottschalk, Christina, Mihaljević, Helena

arXiv.org Artificial Intelligence

Counterspeech is a key strategy against harmful online content, but scaling expert-driven efforts is challenging. Large Language Models (LLMs) present a potential solution, though their use in countering conspiracy theories is under-researched. Unlike for hate speech, no datasets exist that pair conspiracy theory comments with expert-crafted counterspeech. We address this gap by evaluating the ability of GPT-4o, Llama 3, and Mistral to effectively apply counterspeech strategies derived from psychological research provided through structured prompts. Our results show that the models often generate generic, repetitive, or superficial results. Additionally, they over-acknowledge fear and frequently hallucinate facts, sources, or figures, making their prompt-based use in practical applications problematic.
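A minimal sketch of the structured-prompt setup the abstract describes, with invented strategy texts standing in for the psychologically derived strategies used in the paper:

```python
# Hypothetical counterspeech strategies; the paper's actual strategy
# wording comes from psychological research and is not reproduced here.
STRATEGIES = {
    "empathy": "Acknowledge the commenter's concerns without validating false claims.",
    "fact-based": "Correct the central false claim with verifiable information.",
    "socratic": "Ask questions that expose gaps in the conspiracy narrative.",
}

def build_prompt(comment, strategy):
    """Assemble a structured prompt pairing a strategy with a target comment."""
    instruction = STRATEGIES[strategy]
    return (
        "You are writing counterspeech to a conspiracy-theory comment.\n"
        f"Strategy: {instruction}\n"
        f"Comment: {comment}\n"
        "Counterspeech:"
    )

p = build_prompt("The moon landing was staged.", "fact-based")
```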


Automated Safety Evaluations Across 20 Large Language Models: The Aymara LLM Risk and Responsibility Matrix

Contreras, Juan Manuel

arXiv.org Artificial Intelligence

As large language models (LLMs) become increasingly integrated into real-world applications, scalable and rigorous safety evaluation is essential. This paper introduces Aymara AI, a programmatic platform for generating and administering customized, policy-grounded safety evaluations. Aymara AI transforms natural-language safety policies into adversarial prompts and scores model responses using an AI-based rater validated against human judgments. We demonstrate its capabilities through the Aymara LLM Risk and Responsibility Matrix, which evaluates 20 commercially available LLMs across 10 real-world safety domains. Results reveal wide performance disparities, with mean safety scores ranging from 52.4% to 86.2%. While models performed well in well-established safety domains such as Misinformation (mean = 95.7%), they consistently failed in more complex or underspecified domains, notably Privacy & Impersonation (mean = 24.3%). Analyses of variance confirmed that safety scores differed significantly across both models and domains (p < .05). These findings underscore the inconsistent and context-dependent nature of LLM safety and highlight the need for scalable, customizable tools like Aymara AI to support responsible AI development and oversight.
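The model-by-domain matrix the abstract reports can be illustrated with a toy aggregation; the pass/fail scores below are made up, and the real platform scores responses with an AI-based rater rather than binary labels.

```python
# Hypothetical per-response safety outcomes (1 = safe, 0 = unsafe),
# keyed by (model, domain). All values are invented for illustration.
from statistics import mean

scores = {
    ("model_a", "misinformation"): [1, 1, 1, 0, 1],
    ("model_a", "privacy"):        [0, 0, 1, 0, 0],
    ("model_b", "misinformation"): [1, 1, 1, 1, 1],
    ("model_b", "privacy"):        [0, 1, 0, 0, 1],
}

def matrix(scores):
    """Mean safety score (%) per (model, domain) cell."""
    return {k: 100 * mean(v) for k, v in scores.items()}

cells = matrix(scores)
```

Cells like these are what the reported ANOVA compares across models and domains.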


A Modular Taxonomy for Hate Speech Definitions and Its Impact on Zero-Shot LLM Classification Performance

Melis, Matteo, Lapesa, Gabriella, Assenmacher, Dennis

arXiv.org Artificial Intelligence

Detecting harmful content is a crucial task in the landscape of NLP applications for Social Good, with hate speech being one of its most dangerous forms. But what do we mean by hate speech, how can we define it, and how does prompting different definitions of hate speech affect model performance? The contribution of this work is twofold. At the theoretical level, we address the ambiguity surrounding hate speech by collecting and analyzing existing definitions from the literature. We organize these definitions into a taxonomy of 14 Conceptual Elements: building blocks that capture different aspects of hate speech definitions, such as references to the target of hate (individuals or groups) or to its potential consequences. At the experimental level, we employ the collection of definitions in a systematic zero-shot evaluation of three LLMs, on three hate speech datasets representing different types of data (synthetic, human-in-the-loop, and real-world). We find that choosing different definitions, i.e., definitions with a different degree of specificity in terms of encoded elements, impacts model performance, but this effect is not consistent across all architectures.
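The modular-definition idea can be sketched as composing a zero-shot prompt from conceptual elements; the element texts below are invented placeholders, not the taxonomy's actual wording.

```python
# Hypothetical conceptual elements; the paper defines 14 such
# building blocks, with different wording than shown here.
ELEMENTS = {
    "target": "directed at an individual or group based on identity",
    "consequence": "likely to incite harm or discrimination",
    "intent": "intended to demean or dehumanize",
}

def definition_from(elements):
    """Compose a hate speech definition from a chosen subset of elements."""
    return ("Hate speech is content that is "
            + ", and ".join(ELEMENTS[e] for e in elements) + ".")

def zero_shot_prompt(text, elements):
    """Embed the composed definition in a zero-shot classification prompt."""
    return (definition_from(elements)
            + "\nDoes the following text meet this definition?"
            + " Answer yes or no.\nText: " + text)

prompt = zero_shot_prompt("some example post", ["target", "intent"])
```

Varying which elements go into the definition is exactly the knob whose effect on model performance the paper measures.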


A Comparative Analysis of Ethical and Safety Gaps in LLMs using Relative Danger Coefficient

Tereshchenko, Yehor, Hämäläinen, Mika

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) and Large Language Models (LLMs) have rapidly evolved in recent years, showcasing remarkable capabilities in natural language understanding and generation. However, these advancements also raise critical ethical questions regarding safety, potential misuse, discrimination, and overall societal impact. This article provides a comparative analysis of the ethical performance of various AI models, including the brand-new DeepSeek-V3 (R1 with reasoning, and without), various GPT variants (4o, 3.5 Turbo, 4 Turbo, o1/o3 mini), and Gemini (1.5 Flash, 2.0 Flash, and 2.0 Flash Exp), and highlights the need for robust human oversight, especially in situations with high stakes. Furthermore, we present a new metric for calculating harm in LLMs called Relative Danger Coefficient (RDC).


LLMs for Translation: Historical, Low-Resourced Languages and Contemporary AI Models

Tekgurler, Merve

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable adaptability in performing various tasks, including machine translation (MT), without explicit training. Models such as OpenAI's GPT-4 and Google's Gemini are frequently evaluated on translation benchmarks and utilized as translation tools due to their high performance. This paper examines Gemini's performance in translating an 18th-century Ottoman Turkish manuscript, Prisoner of the Infidels: The Memoirs of Osman Agha of Timisoara, into English. The manuscript recounts the experiences of Osman Agha, an Ottoman subject who spent 11 years as a prisoner of war in Austria, and includes his accounts of warfare and violence. Our analysis reveals that Gemini's safety mechanisms flagged between 14 and 23 percent of the manuscript as harmful, resulting in untranslated passages. These safety settings, while effective in mitigating potential harm, hinder the model's ability to provide complete and accurate translations of historical texts. Through real historical examples, this study highlights the inherent challenges and limitations of current LLM safety implementations in the handling of sensitive and context-rich materials. These real-world instances underscore potential failures of LLMs in contemporary translation scenarios where accurate and comprehensive translations are crucial, for example, translating the accounts of modern victims of war for legal proceedings or humanitarian documentation.


Hate Speech and Sentiment of YouTube Video Comments From Public and Private Sources Covering the Israel-Palestine Conflict

Hofmann, Simon, Sommermann, Christoph, Kraus, Mathias, Zschech, Patrick, Rosenberger, Julian

arXiv.org Artificial Intelligence

This study explores the prevalence of hate speech (HS) and sentiment in YouTube video comments concerning the Israel-Palestine conflict by analyzing content from both public and private news sources. The research involved annotating 4983 comments for HS and sentiments (neutral, pro-Israel, and pro-Palestine). Subsequently, machine learning (ML) models were developed, demonstrating robust predictive capabilities with area under the receiver operating characteristic (AUROC) scores ranging from 0.83 to 0.90. These models were applied to the extracted comment sections of YouTube videos from public and private sources, uncovering a higher incidence of HS in public sources (40.4%) compared to private sources (31.6%). Sentiment analysis revealed a predominantly neutral stance in both source types, with more pronounced sentiments towards Israel and Palestine observed in public sources. This investigation highlights the dynamic nature of online discourse surrounding the Israel-Palestine conflict and underscores the importance of content moderation in politically charged environments.
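AUROC, the metric reported above, can be computed directly from labels and scores via its rank-based (Mann-Whitney) formulation; the toy data below is not from the study.

```python
# AUROC as the probability that a randomly chosen positive example
# is scored above a randomly chosen negative one (ties count half).
def auroc(labels, scores):
    pairs = 0
    wins = 0.0
    for li, si in zip(labels, scores):
        for lj, sj in zip(labels, scores):
            if li == 1 and lj == 0:
                pairs += 1
                if si > sj:
                    wins += 1
                elif si == sj:
                    wins += 0.5
    return wins / pairs

# Toy example: two positives ranked above two negatives -> perfect 1.0.
a = auroc([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.2])
```

An AUROC of 0.83-0.90, as the study reports, means a randomly chosen HS comment outranks a randomly chosen non-HS comment 83-90% of the time.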


AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages

Muhammad, Shamsuddeen Hassan, Abdulmumin, Idris, Ayele, Abinew Ali, Adelani, David Ifeoluwa, Ahmad, Ibrahim Said, Aliyu, Saminu Mohammad, Onyango, Nelson Odhiambo, Wanzare, Lilian D. A., Rutunda, Samuel, Aliyu, Lukman Jibril, Alemneh, Esubalew, Hourrane, Oumaima, Gebremichael, Hagos Tesfahun, Ismail, Elyas Abdi, Beloucif, Meriem, Jibril, Ebrahim Chekol, Bukula, Andiswa, Mabuya, Rooweither, Osei, Salomey, Oppong, Abigail, Belay, Tadesse Destaw, Guge, Tadesse Kebede, Asfaw, Tesfa Tegegne, Chukwuneke, Chiamaka Ijeoma, Röttger, Paul, Yimam, Seid Muhie, Ousidhoum, Nedjma

arXiv.org Artificial Intelligence

Hate speech and abusive language are global phenomena that need socio-cultural background knowledge to be understood, identified, and moderated. However, in many regions of the Global South, there have been several documented occurrences of (1) absence of moderation and (2) censorship due to the reliance on keyword spotting out of context. Further, high-profile individuals have frequently been at the center of the moderation process, while large and targeted hate speech campaigns against minorities have been overlooked. These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. Each instance in AfriHate is annotated by native speakers familiar with the local culture. We report the challenges related to the construction of the datasets and present various classification baseline results with and without using LLMs. The datasets, individual annotations, and hate speech and offensive language lexicons are available on https://github.com/AfriHate/AfriHate


Digital Guardians: Can GPT-4, Perspective API, and Moderation API reliably detect hate speech in reader comments of German online newspapers?

Weber, Manuel, Huber, Moritz, Auch, Maximilian, Döschl, Alexander, Keller, Max-Emanuel, Mandl, Peter

arXiv.org Artificial Intelligence

In recent years, toxic content and hate speech have become widespread phenomena on the internet. Moderators of online newspapers and forums are now required, partly due to legal regulations, to carefully review and, if necessary, delete reader comments. This is a labor-intensive process. Some providers of large language models already offer solutions for automated hate speech detection or the identification of toxic content. These include GPT-4o from OpenAI, Jigsaw's (Google) Perspective API, and OpenAI's Moderation API. Based on the selected German test dataset HOCON34k, which was specifically created for developing tools to detect hate speech in reader comments of online newspapers, these solutions are compared with each other and against the HOCON34k baseline. The test dataset contains 1,592 annotated text samples. For GPT-4o, three different promptings are used, employing a Zero-Shot, One-Shot, and Few-Shot approach. The results of the experiments demonstrate that GPT-4o outperforms both the Perspective API and the Moderation API, and exceeds the HOCON34k baseline by approximately 5 percentage points, as measured by a combined metric of MCC and F2-score.